Reconstruction on Trees: 
Exponential Moment Bounds for Linear Estimators* 

Yuval Peres    Sebastien Roch

August 14, 2009 

Abstract 

Consider a Markov chain $(\xi_v)_{v \in V} \in [k]^V$ on the infinite $b$-ary tree
$T = (V, E)$ with irreducible edge transition matrix $M$, where $b \geq 2$, $k \geq 2$ and
$[k] = \{1, \ldots, k\}$. We denote by $L_n$ the level-$n$ vertices of $T$. Assume $M$ has
a real second-largest (in absolute value) eigenvalue $\lambda$ with corresponding real
eigenvector $\nu \neq 0$. Letting $\sigma_v = \nu_{\xi_v}$, we consider the following root-state
estimator, which was introduced by Mossel and Peres (2003) in the context
of the "reconstruction problem" on trees:

$$S_n = (b\lambda)^{-n} \sum_{x \in L_n} \sigma_x.$$

As noted by Mossel and Peres, when $b\lambda^2 > 1$ (the so-called Kesten-Stigum
reconstruction phase) the quantity $S_n$ has uniformly bounded variance. Here,
we give bounds on the moment-generating functions of $S_n$ and $S_n^2$ when
$b\lambda^2 > 1$. Our results have implications for the inference of evolutionary
trees.

Keywords: Markov chains on trees, reconstruction problem, Kesten-Stigum 
bound, phylogenetic reconstruction 



1 Introduction 

We first state our main theorem. Related results and applications are discussed at 
the end of the section. 

Basic setup. For $b \geq 2$, let $T = (V, E)$ be the infinite $b$-ary tree rooted at $\rho$.
Denote by $T_n$ the first $n \geq 0$ levels of $T$. Let $M = (M_{ij})_{i,j=1}^k$ be a $k \times k$
irreducible stochastic matrix with stationary distribution $\pi > 0$. Assume $M$ has a
real second-largest (in absolute value) eigenvalue $\lambda$ and let $\nu$ be a real right
eigenvector corresponding to $\lambda$ with

$$\sum_{i=1}^k \pi_i \nu_i^2 = 1.$$

Let $[k] = \{1, \ldots, k\}$. Consider the following Markov process on $T$: pick a root
state in $[k]$ according to $\pi$; moving away from the root, apply the channel $M$ to
each edge independently. Denote by $(\xi_v)_{v \in V}$ the state assignment so obtained and
let

$$\sigma_v = \nu_{\xi_v}$$

for all $v \in V$.

Reconstruction. In the so-called "reconstruction problem," one seeks, roughly
speaking, to infer the state at the root from the states at level $n$, as $n \to \infty$. This
problem has been studied extensively in probability theory and statistical physics.
See e.g. [EKPS00] for background and references. Here, we are interested in the
following root-state estimator introduced in [MP03]. For $n \geq 0$, let $L_n$ be the
vertices of $T$ at level $n$. Consider the following quantity

$$S_n = \frac{1}{(b\lambda)^n} \sum_{x \in L_n} \sigma_x. \qquad (1)$$

It is easy to show that for all $n \geq 0$

$$\mathbb{E}[S_n \mid \xi_\rho = i] = \nu_i,$$

that is, $S_n$ is "unbiased." Moreover, it was shown in [MP03] that in the so-called
Kesten-Stigum reconstruction phase, that is, when $b\lambda^2 > 1$, it holds that for all
$n \geq 0$

$$\max_i \mathbb{E}[S_n^2 \mid \xi_\rho = i] \leq C < +\infty,$$

where $C = C(M)$ is a constant depending only on $M$ (not on $n$).

Main results. For $n \geq 0$, $i = 1, \ldots, k$, and $\zeta \in \mathbb{R}$, let

$$\Gamma^i_n(\zeta) = \mathbb{E}[e^{\zeta S_n} \mid \xi_\rho = i]$$

and

$$\tilde{\Gamma}^i_n(\zeta) = \mathbb{E}[e^{\zeta S_n^2} \mid \xi_\rho = i].$$

We prove the following.

Theorem 1 (Exponential Moment Bound) Assume $M$ is such that $b\lambda^2 > 1$. Then,
there is $c = c(M) < +\infty$ such that for all $n \geq 0$, $i = 1, \ldots, k$, and $\zeta \in \mathbb{R}$, it
holds that

$$\Gamma^i_n(\zeta) \leq e^{\nu_i \zeta + c \zeta^2} < +\infty.$$

Note that $\nu_i = \mathbb{E}[S_n \mid \xi_\rho = i]$.

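Corollary 1 Assume $M$ is such that $b\lambda^2 > 1$. Then, there is $\tilde{\zeta} = \tilde{\zeta}(M) \in (0, +\infty)$
and $\tilde{C} = \tilde{C}(M) < +\infty$ such that for all $n \geq 0$, $i = 1, \ldots, k$, and $\zeta \in (-\tilde{\zeta}, \tilde{\zeta})$, it
holds that

$$\tilde{\Gamma}^i_n(\zeta) \leq \tilde{C} < +\infty.$$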

The proofs of Theorem 1 and Corollary 1 can be found in Section 2.
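As a sanity check on the identity $\mathbb{E}[S_n \mid \xi_\rho = i] = \nu_i$ noted after Theorem 1, the following minimal Python sketch computes the conditional mean exactly from powers of $M$. The binary symmetric channel, the values `b = 2` and `p = 0.1`, and the helper name `conditional_mean_S` are illustrative assumptions and do not come from the source.

```python
import numpy as np

# Illustrative example (an assumption, not from the source):
# binary symmetric channel on two states.
b = 2                        # branching number of the tree
p = 0.1                      # flip probability (hypothetical value)
M = np.array([[1 - p, p],
              [p, 1 - p]])
pi = np.array([0.5, 0.5])    # stationary distribution of M
lam = 1 - 2 * p              # second eigenvalue of M
nu = np.array([1.0, -1.0])   # right eigenvector of M for lam


def conditional_mean_S(n):
    """Return the vector (E[S_n | root = i])_i, computed exactly.

    Given root state i, the expected number of level-n vertices in state j
    is b**n * (M**n)[i, j], so the expected sum of sigma over level n is
    b**n * (M**n @ nu)[i]; dividing by (b * lam)**n gives E[S_n | root = i].
    """
    Mn = np.linalg.matrix_power(M, n)
    return (b ** n) * (Mn @ nu) / (b * lam) ** n


# Checks that the example satisfies the standing assumptions.
assert np.allclose(pi @ M, pi)         # pi is stationary
assert np.allclose(M @ nu, lam * nu)   # nu is a right eigenvector for lam
assert np.isclose(pi @ nu ** 2, 1.0)   # normalization sum_i pi_i nu_i^2 = 1
assert b * lam ** 2 > 1                # Kesten-Stigum phase

for n in range(6):
    print(n, conditional_mean_S(n))    # each row should match nu up to rounding
```

The computation only uses $M^n \nu = \lambda^n \nu$, so the same check applies to any example matrix with a real second eigenvalue.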

Related results. Moment-generating functions of random variables similar to (1)
have been studied in the context of multi-type branching processes. In particular,
Athreya and Vidyashankar [AV95] have obtained large-deviation results for
quantities of the type (in our setting)

$$R_n = b^{-n} Z_n \cdot w - \pi \cdot w,$$

where $w \in \mathbb{R}^k$ and $Z_n = (Z_n^{(1)}, \ldots, Z_n^{(k)})$ is the "census" vector, that is,

$$Z_n^{(i)} = |\{x \in L_n : \xi_x = i\}|,$$

for all $i \in [k]$. However, note that we are interested in the degenerate case
$w = \nu \perp \pi$ (see e.g. [HJ85]) and our results cannot be deduced from [AV95].
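The census vector can be sampled one level at a time, since the $b$ children of each vertex in state $i$ draw their states independently from row $i$ of $M$. The sketch below does this and evaluates $S_n = Z_n \cdot \nu / (b\lambda)^n$. The binary symmetric channel, the parameter values, the 0-based state labels, and the function names `sample_census` and `S_from_census` are assumptions made for illustration only.

```python
import numpy as np

# Illustrative example (an assumption, not from the source).
b, p = 2, 0.1
M = np.array([[1 - p, p],
              [p, 1 - p]])
lam = 1 - 2 * p
nu = np.array([1.0, -1.0])


def sample_census(root_state, n, rng):
    """Sample the census vector Z_n given the root state (0-based index)."""
    k = M.shape[0]
    Z = np.zeros(k, dtype=np.int64)
    Z[root_state] = 1
    for _ in range(n):
        nxt = np.zeros(k, dtype=np.int64)
        for i in range(k):
            # The b * Z[i] children of state-i vertices choose their states
            # independently according to row i of M.
            nxt += rng.multinomial(b * Z[i], M[i])
        Z = nxt
    return Z


def S_from_census(Z, n):
    """Evaluate S_n = (Z_n . nu) / (b * lam)**n from a census vector."""
    return (Z @ nu) / (b * lam) ** n


rng = np.random.default_rng(0)
n = 10
Z = sample_census(0, n, rng)
print(Z, S_from_census(Z, n))
```

Working with the census rather than the full state assignment keeps the cost linear in $n$ for fixed $k$, at the price of discarding the tree structure, which $S_n$ does not use.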

Note moreover that our bounds cannot hold when $b\lambda^2 < 1$. Indeed, in that case,
a classical CLT of Kesten and Stigum [KS66] for multi-type branching processes
implies that the quantity

$$b^{-n/2} \sum_{x \in L_n} \sigma_x = b^{n/2} \lambda^n S_n$$

converges in distribution to a centered Gaussian with a finite variance
(independently of the root state). See [MP03] for more on the Kesten-Stigum CLT and its
relation to the reconstruction problem.






Motivation. The motivation behind our results comes from mathematical biol- 
ogy. More particularly, our main theorem has recently played a role in the solution 
of important questions in mathematical phylogenetics, which we now briefly dis- 
cuss. 

As mentioned above, the quantity $S_n$ arises naturally in the reconstruction
problem as a simple "linear" estimator of the root state [EKPS00, MP03]. In the
past few years, deep connections have been established between the reconstruction
problem and the inference of phylogenies, a central problem in computational
biology [SS03, Fel04]. A phylogeny is a tree representing the evolutionary history
of a group of organisms, where the leaves are modern species and the branchings
correspond to past speciation events. To reconstruct phylogenies, biologists extract
(aligned) biomolecular sequences from extant species. It is standard in evolutionary
biology to model such collections of sequences as independent samples from
the leaves of a Markov chain on a finite tree

$$\{(\xi^i_x)_{x \in L_n}\}_{i=1}^{\ell}, \qquad (2)$$

where $\ell$ is the sequence length. The goal of phylogenetics is to infer the
leaf-labelled tree that generated these samples. In particular, developing reconstruction
techniques that require as few samples as possible is of practical importance.

An insightful conjecture of Steel [Ste01] suggests that the reconstruction of
phylogenies can be achieved from much shorter sequences when the reconstruction
problem is "solvable," in particular in the Kesten-Stigum reconstruction phase.
This conjecture has been established in the binary symmetric case (equivalent to
the ferromagnetic Ising model), that is, the case $k = 2$ and $M$ symmetric, by
Mossel [Mos04] and Daskalakis et al. [DMR09]. The main idea behind these results
is to "boost" standard tree-building techniques by inferring ancestral sequences.
See [Mos04, DMR09] for details.

Establishing Steel's conjecture under more realistic models of sequence evolution
(i.e., more general transition matrices $M$) is a major open problem in
mathematical phylogenetics. Roughly, to reconstruct a phylogeny from samples at level
$n$ one iteratively joins the most correlated pairs of nodes, starting from level $n$ and
moving towards the root. To estimate the correlation between internal nodes $u$ and
$v$ on level $m < n$ using only the samples (2), it is natural to consider quantities such as

$$\frac{1}{\ell} \sum_{i=1}^{\ell} \left( \frac{1}{(b\lambda)^{n-m}} \sum_{x \in L_n^u} \sigma^i_x \right) \left( \frac{1}{(b\lambda)^{n-m}} \sum_{x \in L_n^v} \sigma^i_x \right), \qquad (3)$$

where $\sigma^i_x$ denotes the value of $\sigma_x$ in the $i$-th sample and $L_n^u$ is the set of
nodes on level $n$ below $u$. In words, we estimate the correlation between the
reconstructed states at $u$ and $v$. Proving concentration of such
quantities necessitates uniform bounds on the moment-generating functions of $S_n$
and $S_n^2$, which is our main result. We note in particular that our main theorem was recently
used by Roch [Roc09], building on [Roc08], to prove Steel's conjecture for general
$k$ and reversible transition matrices of the form $M = e^{\tau Q}$ in the Kesten-Stigum
phase. Moreover, this result was established using a surprisingly simple algorithm
known in phylogenetics as a "distance-based method," thereby contradicting a
conjecture regarding the weakness of this widely used class of methods. See [Roc08]
for background.
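To make the link between moment-generating function bounds and concentration concrete, here is the standard Chernoff argument applied to a bound of the form $\Gamma^i_n(\zeta) \leq e^{\nu_i \zeta + c\zeta^2}$, where $\Gamma^i_n(\zeta) = \mathbb{E}[e^{\zeta S_n} \mid \xi_\rho = i]$. This derivation is an illustration and does not appear in the source; it assumes $c > 0$ and $t > 0$.

```latex
\begin{align*}
\mathbb{P}[S_n - \nu_i \geq t \mid \xi_\rho = i]
  &\leq e^{-\zeta(\nu_i + t)} \, \Gamma^i_n(\zeta)
     && \text{(Markov's inequality, any } \zeta > 0\text{)} \\
  &\leq e^{-\zeta t + c \zeta^2}
     && \text{(moment bound)} \\
  &= e^{-t^2/(4c)}
     && \text{(choosing } \zeta = t/(2c)\text{)}.
\end{align*}
```

Applying the same steps to $-S_n$ gives the matching lower tail, so a bound of this type yields Gaussian-type tails for $S_n$ that are uniform in $n$.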

Organization. The proof of our results can be found in Section 2.

2 Proof 

We first prove our main theorem in a neighbourhood around zero. 

Lemma 1 Assume $M$ is such that $b\lambda^2 > 1$. Then, there is $c' = c'(M) < +\infty$ and
$\zeta_0 \in (0, +\infty)$ such that for all $n \geq 0$, $i = 1, \ldots, k$, and $|\zeta| < \zeta_0$, it holds that

$$\Gamma^i_n(\zeta) \leq e^{\nu_i \zeta + c' \zeta^2}.$$

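Proof: We prove the result by induction on $n$. For $n = 0$, note that

$$\Gamma^i_0(\zeta) = e^{\nu_i \zeta},$$

so the first step of the induction holds for all $c' \geq 0$ and all $\zeta \in \mathbb{R}$.

Now assume the result holds for $n \geq 0$ with $c'$ and $\zeta_0$ to be determined later.
For $n \geq 0$, $i = 1, \ldots, k$, and $\zeta \in \mathbb{R}$, let

$$\gamma^i_n(\zeta) = \ln \Gamma^i_n(\zeta).$$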

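Let $\alpha_1, \ldots, \alpha_b$ be the children of $\rho$ and, for $\omega = 1, \ldots, b$, denote by $L^\omega_{n+1}$ the
descendants of $\alpha_\omega$ on the $(n+1)$-st level. For $\omega = 1, \ldots, b$, let

$$S^\omega_{n+1} = \frac{1}{(b\lambda)^n} \sum_{x \in L^\omega_{n+1}} \sigma_x.$$

Note that conditioned on $\xi_\rho$, the random vectors

$$(\sigma_x)_{x \in L^1_{n+1}}, \ldots, (\sigma_x)_{x \in L^b_{n+1}}$$

are independent and identically distributed. Hence, the variables

$$S^1_{n+1}, \ldots, S^b_{n+1}$$

are also conditionally independent and identically distributed. Applying the
channel to the first level of the tree and using the induction hypothesis, we have for
$\zeta \in (-\zeta_0, \zeta_0)$

$$\begin{aligned}
\gamma^i_{n+1}(\zeta) &= \ln \mathbb{E}\left[ e^{\zeta S_{n+1}} \mid \xi_\rho = i \right] \\
&= \ln \mathbb{E}\left[ \exp\left( \frac{\zeta}{b\lambda} \sum_{\omega=1}^{b} S^\omega_{n+1} \right) \,\Big|\, \xi_\rho = i \right] \\
&= b \ln \mathbb{E}\left[ \exp\left( \frac{\zeta}{b\lambda} S^1_{n+1} \right) \,\Big|\, \xi_\rho = i \right] \\
&= b \ln \left( \sum_{j=1}^{k} M_{ij} \, \mathbb{E}\left[ \exp\left( \frac{\zeta}{b\lambda} S^1_{n+1} \right) \,\Big|\, \xi_{\alpha_1} = j \right] \right) \\
&\leq b \ln \left( \sum_{j=1}^{k} M_{ij} \exp\left( \nu_j \frac{\zeta}{b\lambda} + c' \left( \frac{\zeta}{b\lambda} \right)^2 \right) \right),
\end{aligned}$$

where we used that by assumption

$$|b\lambda| > \frac{1}{|\lambda|} \geq 1$$

so that $\zeta/(b\lambda) \in (-\zeta_0, \zeta_0)$. By a Taylor expansion, as $\zeta_0$ goes to zero (in particular
$\zeta_0 < 1$), we have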



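$$\begin{aligned}
\gamma^i_{n+1}(\zeta) &\leq c' \frac{\zeta^2}{b\lambda^2} + b \ln \left( \sum_{j=1}^{k} M_{ij} \exp\left( \nu_j \frac{\zeta}{b\lambda} \right) \right) \\
&\leq c' \frac{\zeta^2}{b\lambda^2} + b \ln \left( 1 + \lambda \nu_i \frac{\zeta}{b\lambda} + \frac{1}{2} \|\nu\|_\infty^2 \left( \frac{\zeta}{b\lambda} \right)^2 + O_{\zeta_0}(|\zeta|^3) \right) \\
&\leq \nu_i \zeta + \left\{ c' + \frac{1}{2} \|\nu\|_\infty^2 \right\} \frac{\zeta^2}{b\lambda^2} + O_{\zeta_0}(|\zeta|^3).
\end{aligned}$$

Choose $c' > 0$ large enough so that

$$\left\{ c' + \frac{1}{2} \|\nu\|_\infty^2 \right\} \frac{1}{b\lambda^2} < c',$$

that is,

$$c' > \frac{\|\nu\|_\infty^2}{2b\lambda^2} \left( 1 - \frac{1}{b\lambda^2} \right)^{-1}.$$

Note that $c'$ is well defined when $b\lambda^2 > 1$. Then there is $\zeta_0 \in (0, +\infty)$ such that
for all $\zeta \in (-\zeta_0, \zeta_0)$

$$\gamma^i_{n+1}(\zeta) \leq \nu_i \zeta + c' \zeta^2.$$

That concludes the proof. ■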
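The recursion behind this induction, $\Gamma^i_{n+1}(\zeta) = \big( \sum_j M_{ij} \Gamma^j_n(\zeta/(b\lambda)) \big)^b$ with $\Gamma^i_0(\zeta) = e^{\nu_i \zeta}$, can be evaluated exactly. The sketch below does so for a binary symmetric channel and compares $\ln \Gamma^i_n(\zeta)$ with a quadratic envelope. The example channel, the parameter values, the function name `log_mgf`, and the constant `c_check` are assumptions of this sketch: `c_check` comes from replacing the Taylor expansion by Hoeffding's lemma, and it is not the constant $c'$ or $c$ of the source.

```python
import numpy as np

# Illustrative example (an assumption, not from the source).
b, p = 2, 0.1
M = np.array([[1 - p, p],
              [p, 1 - p]])
lam = 1 - 2 * p
nu = np.array([1.0, -1.0])
assert b * lam ** 2 > 1  # Kesten-Stigum phase


def log_mgf(n, zeta):
    """Return the vector (ln E[exp(zeta * S_n) | root = i])_i exactly."""
    if n == 0:
        return nu * zeta  # S_0 is the value nu_i at the root
    # One level of the tree: zeta is divided by b * lam, the channel M is
    # applied, and the b conditionally i.i.d. subtrees multiply the result.
    inner = log_mgf(n - 1, zeta / (b * lam))
    return b * np.log(M @ np.exp(inner))


# Hypothetical envelope constant from Hoeffding's lemma (an assumption of
# this sketch): c_check = ||nu||_inf^2 / (2 * (b * lam^2 - 1)).
c_check = np.max(np.abs(nu)) ** 2 / (2 * (b * lam ** 2 - 1))

# Largest violation of ln Gamma <= nu * zeta + c_check * zeta^2 over a grid;
# a value that is not positive (up to rounding) is consistent with the bound.
worst = max(
    np.max(log_mgf(n, z) - nu * z - c_check * z ** 2)
    for n in range(13)
    for z in np.linspace(-5.0, 5.0, 101)
)
print(worst)
```

The recursive call mirrors the remark that each level of the recursion divides $\zeta$ by $b\lambda$, which is the mechanism exploited in the proofs of both lemmas.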

The following lemma deals with values of $\zeta$ away from zero.

Lemma 2 Assume $M$ is such that $b\lambda^2 > 1$. Let $\zeta_0 \in (0, +\infty)$ be as in Lemma 1.
Then, there is $c'' = c''(M) < +\infty$ such that for all $n \geq 0$, $i = 1, \ldots, k$, and
$|\zeta| \geq \zeta_0$, it holds that

$$\Gamma^i_n(\zeta) \leq e^{c'' \zeta^2}.$$

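Proof: Let $c'$ be as in Lemma 1. Let $\zeta_1 \in (0, +\infty)$ be such that

$$\zeta_1 < \frac{\zeta_0}{|b\lambda|}. \qquad (4)$$

Choose $c'' > c'$ large enough so that

$$e^{\nu_i \zeta + c' \zeta^2} \leq e^{c'' \zeta^2}, \qquad (5)$$

for all $|\zeta| \geq \zeta_1$ and for all $i = 1, \ldots, k$.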

Let $n \geq 0$ and $\zeta$ with $|\zeta| \geq \zeta_0$ be fixed. Note that, when we relate the
exponential moment at level $m$ to that at level $m - 1$ with a recursion as in the proof of
Lemma 1, the value of $\zeta$ is effectively divided by $b\lambda$. Therefore, there are two cases
in the proof: either we reach the interval $(-\zeta_0, \zeta_0)$ by the time we reach $m = 0$ in
the recursion; or we do not.

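1. First assume that

$$\left| \frac{\zeta}{(b\lambda)^n} \right| \geq \zeta_0, \qquad (6)$$

that is, we do not reach $(-\zeta_0, \zeta_0)$. We prove the result by induction on the
level $m = 0, \ldots, n$. At $m = 0$, we have

$$\Gamma^i_0\left( \frac{\zeta}{(b\lambda)^n} \right) = e^{\nu_i \zeta/(b\lambda)^n} \leq e^{c'' \left( \zeta/(b\lambda)^n \right)^2}$$

by (5) and (6) for all $i = 1, \ldots, k$. Assume for the sake of the induction that

$$\gamma^i_m\left( \frac{\zeta}{(b\lambda)^{n-m}} \right) \leq c'' \left( \frac{\zeta}{(b\lambda)^{n-m}} \right)^2$$

for all $i = 1, \ldots, k$. Using the calculations of Lemma 1, we have

$$\begin{aligned}
\gamma^i_{m+1}\left( \frac{\zeta}{(b\lambda)^{n-(m+1)}} \right)
&= b \ln \left( \sum_{j=1}^{k} M_{ij} \, \Gamma^j_m\left( \frac{1}{b\lambda} \cdot \frac{\zeta}{(b\lambda)^{n-(m+1)}} \right) \right) \\
&\leq b \ln \left( \sum_{j=1}^{k} M_{ij} \, e^{c'' \left( \zeta/(b\lambda)^{n-m} \right)^2} \right) \\
&= \frac{b c''}{b^2 \lambda^2} \left( \frac{\zeta}{(b\lambda)^{n-(m+1)}} \right)^2 \\
&\leq c'' \left( \frac{\zeta}{(b\lambda)^{n-(m+1)}} \right)^2,
\end{aligned}$$

where we used $b\lambda^2 > 1$ on the last line. The proof of the first case follows
by induction, that is, we have

$$\Gamma^i_n(\zeta) \leq e^{c'' \zeta^2}$$

for all $i = 1, \ldots, k$.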



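2. Assume now that

$$\left| \frac{\zeta}{(b\lambda)^n} \right| < \zeta_0. \qquad (7)$$

Let $m^*$ be the largest value in $0, \ldots, n$ such that

$$\left| \frac{\zeta}{(b\lambda)^{n-m^*}} \right| < \zeta_0. \qquad (8)$$

The purpose of Assumption (4) above is to make sure that we never "jump"
entirely over the subset of $(-\zeta_0, \zeta_0)$ where (5) holds. Indeed, by (4) and

$$\left| \frac{\zeta}{(b\lambda)^{n-(m^*+1)}} \right| \geq \zeta_0 \qquad (9)$$

it follows that we must also have

$$\left| \frac{\zeta}{(b\lambda)^{n-m^*}} \right| > \zeta_1. \qquad (10)$$

Hence, by (5) and Lemma 1, we get

$$\Gamma^i_{m^*}\left( \frac{\zeta}{(b\lambda)^{n-m^*}} \right) \leq e^{\nu_i \zeta/(b\lambda)^{n-m^*} + c' \left( \zeta/(b\lambda)^{n-m^*} \right)^2} \leq e^{c'' \left( \zeta/(b\lambda)^{n-m^*} \right)^2}$$

for all $i = 1, \ldots, k$. The proof then follows by induction as in the first case
above. ■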



Proof of Theorem 1: Let $\zeta_0$, $c'$ and $c''$ be as in Lemmas 1 and 2. Choose $c > c''$ ($> c'$)
large enough so that

$$e^{c'' \zeta^2} \leq e^{\nu_i \zeta + c \zeta^2}, \qquad (11)$$

for all $|\zeta| \geq \zeta_0$ and for all $i = 1, \ldots, k$. The result then follows by combining
Lemmas 1 and 2. ■

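Proof of Corollary 1: We use a standard trick relating the exponential moment of
the square to that of a Gaussian. Let $X$ be a standard normal independent of the
process. Using Theorem 1 and applying Fubini we have for all $n \geq 0$,
$i = 1, \ldots, k$, and $\zeta \geq 0$

$$\mathbb{E}\left[ e^{\zeta S_n^2} \mid \xi_\rho = i \right] = \mathbb{E}\left[ e^{\sqrt{2\zeta} S_n X} \mid \xi_\rho = i \right] \leq \mathbb{E}\left[ e^{\nu_i \sqrt{2\zeta} X + 2 c \zeta X^2} \right].$$

The last expectation is finite for $\zeta$ small enough. For $\zeta < 0$, the bound
$\tilde{\Gamma}^i_n(\zeta) \leq 1$ is immediate. ■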
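To see how small $\zeta$ needs to be, one can evaluate a Gaussian expectation of the form $\mathbb{E}[e^{aX + \beta X^2}]$ in closed form by completing the square. The worked computation below is a sketch added for illustration, not part of the source; it assumes $X$ is a standard normal, $a \in \mathbb{R}$, $\beta < 1/2$, and a constant $c > 0$ as in Theorem 1.

```latex
% Assumptions of this sketch: X standard normal, a real, beta < 1/2.
\begin{align*}
\mathbb{E}\left[ e^{aX + \beta X^2} \right]
  &= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty}
     \exp\left( -\frac{(1 - 2\beta) x^2}{2} + a x \right) dx \\
  &= \frac{1}{\sqrt{1 - 2\beta}} \exp\left( \frac{a^2}{2(1 - 2\beta)} \right).
\end{align*}
% Specializing to a = \nu_i \sqrt{2\zeta} and \beta = 2c\zeta,
% with 0 \le \zeta < 1/(4c):
\[
\mathbb{E}\left[ e^{\nu_i \sqrt{2\zeta} X + 2c\zeta X^2} \right]
  = \frac{1}{\sqrt{1 - 4c\zeta}} \exp\left( \frac{\nu_i^2 \zeta}{1 - 4c\zeta} \right).
\]
```

Under these assumptions the expectation is finite exactly when $\zeta < 1/(4c)$, so any threshold strictly below $1/(4c)$ would give a bound that is uniform in $n$ and $i$.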